{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/vector_stores/CognitiveSearchIndexDemo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Azure AI Search"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basic Example\n",
    "\n",
    "In this notebook, we take a Paul Graham essay, split it into chunks, embed it using an Azure OpenAI embedding model, load it into an Azure AI Search index, and then query it."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index\n",
    "!pip install wget"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "import sys\n",
    "from azure.core.credentials import AzureKeyCredential\n",
    "from azure.search.documents import SearchClient\n",
    "from azure.search.documents.indexes import SearchIndexClient\n",
    "from IPython.display import Markdown, display\n",
    "from llama_index import (\n",
    "    ServiceContext,\n",
    "    SimpleDirectoryReader,\n",
    "    StorageContext,\n",
    "    VectorStoreIndex,\n",
    ")\n",
    "from llama_index.embeddings import AzureOpenAIEmbedding\n",
    "from llama_index.llms import AzureOpenAI\n",
    "from llama_index.vector_stores import AzureAISearchVectorStore\n",
    "from llama_index.vector_stores.azureaisearch import (\n",
    "    IndexManagement,\n",
    "    MetadataIndexFieldType,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup Azure OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "aoai_api_key = \"YOUR_AZURE_OPENAI_API_KEY\"\n",
    "aoai_endpoint = \"YOUR_AZURE_OPENAI_ENDPOINT\"\n",
    "aoai_api_version = \"2023-05-15\"\n",
    "\n",
    "llm = AzureOpenAI(\n",
    "    model=\"YOUR_AZURE_OPENAI_COMPLETION_MODEL_NAME\",\n",
    "    deployment_name=\"YOUR_AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME\",\n",
    "    api_key=aoai_api_key,\n",
    "    azure_endpoint=aoai_endpoint,\n",
    "    api_version=aoai_api_version,\n",
    ")\n",
    "\n",
    "# You need to deploy your own embedding model as well as your own chat completion model\n",
    "embed_model = AzureOpenAIEmbedding(\n",
    "    model=\"YOUR_AZURE_OPENAI_EMBEDDING_MODEL_NAME\",\n",
    "    deployment_name=\"YOUR_AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME\",\n",
    "    api_key=aoai_api_key,\n",
    "    azure_endpoint=aoai_endpoint,\n",
    "    api_version=aoai_api_version,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup Azure AI Search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "search_service_api_key = \"YOUR-AZURE-SEARCH-SERVICE-ADMIN-KEY\"\n",
    "search_service_endpoint = \"YOUR-AZURE-SEARCH-SERVICE-ENDPOINT\"\n",
    "search_service_api_version = \"2023-11-01\"\n",
    "credential = AzureKeyCredential(search_service_api_key)\n",
    "\n",
    "\n",
    "# Index name to use\n",
    "index_name = \"llamaindex-vector-demo\"\n",
    "\n",
    "# Use index client to demonstrate creating an index\n",
    "index_client = SearchIndexClient(\n",
    "    endpoint=search_service_endpoint,\n",
    "    credential=credential,\n",
    ")\n",
    "\n",
    "# Use search client to demonstration using existing index\n",
    "search_client = SearchClient(\n",
    "    endpoint=search_service_endpoint,\n",
    "    index_name=index_name,\n",
    "    credential=credential,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create Index (if it does not exist)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Demonstrates creating a vector index named \"llamaindex-vector-demo\" if one doesn't exist. The index has the following fields:\n",
    "| Field Name | OData Type                |  \n",
    "|------------|---------------------------|  \n",
    "| id         | `Edm.String`              |  \n",
    "| chunk      | `Edm.String`              |  \n",
    "| embedding  | `Collection(Edm.Single)`  |  \n",
    "| metadata   | `Edm.String`              |  \n",
    "| doc_id     | `Edm.String`              |  \n",
    "| author     | `Edm.String`              |  \n",
    "| theme      | `Edm.String`              |  \n",
    "| director   | `Edm.String`              |  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "metadata_fields = {\n",
    "    \"author\": \"author\",\n",
    "    \"theme\": (\"topic\", MetadataIndexFieldType.STRING),\n",
    "    \"director\": \"director\",\n",
    "}\n",
    "\n",
    "vector_store = AzureAISearchVectorStore(\n",
    "    search_or_index_client=index_client,\n",
    "    filterable_metadata_field_keys=metadata_fields,\n",
    "    index_name=index_name,\n",
    "    index_management=IndexManagement.CREATE_IF_NOT_EXISTS,\n",
    "    id_field_key=\"id\",\n",
    "    chunk_field_key=\"chunk\",\n",
    "    embedding_field_key=\"embedding\",\n",
    "    embedding_dimensionality=1536,\n",
    "    metadata_string_field_key=\"metadata\",\n",
    "    doc_id_field_key=\"doc_id\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Loading documents\n",
    "Load the documents stored in the `data/paul_graham/` using the SimpleDirectoryReader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load documents\n",
    "documents = SimpleDirectoryReader(\"../data/paul_graham/\").load_data()\n",
    "\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "service_context = ServiceContext.from_defaults(\n",
    "    llm=llm, embed_model=embed_model\n",
    ")\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, storage_context=storage_context, service_context=service_context\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<b>The author engaged in writing and programming activities during their formative years. They initially wrote short stories and later transitioned to programming on the IBM 1401 using an early version of Fortran. Subsequently, with the advent of microcomputers, the author began programming on a TRS-80, writing simple games, a rocket flight prediction program, and a word processor.</b>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Query Data\n",
    "query_engine = index.as_query_engine(similarity_top_k=3)\n",
    "response = query_engine.query(\"What did the author do growing up?\")\n",
    "display(Markdown(f\"<b>{response}</b>\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<b>The author learned that the study of philosophy in college did not live up to their expectations, as they found the courses to be boring and lacking in ultimate truths. This led them to switch their focus to AI, which was influenced by a novel featuring an intelligent computer and a PBS documentary showcasing advanced technology.</b>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "response = query_engine.query(\n",
    "    \"What did the author learn?\",\n",
    ")\n",
    "display(Markdown(f\"<b>{response}</b>\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use Existing Index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "index_name = \"llamaindex-vector-demo\"\n",
    "\n",
    "metadata_fields = {\n",
    "    \"author\": \"author\",\n",
    "    \"theme\": (\"topic\", MetadataIndexFieldType.STRING),\n",
    "    \"director\": \"director\",\n",
    "}\n",
    "vector_store = AzureAISearchVectorStore(\n",
    "    search_or_index_client=search_client,\n",
    "    filterable_metadata_field_keys=metadata_fields,\n",
    "    index_management=IndexManagement.VALIDATE_INDEX,\n",
    "    id_field_key=\"id\",\n",
    "    chunk_field_key=\"chunk\",\n",
    "    embedding_field_key=\"embedding\",\n",
    "    embedding_dimensionality=1536,\n",
    "    metadata_string_field_key=\"metadata\",\n",
    "    doc_id_field_key=\"doc_id\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "service_context = ServiceContext.from_defaults(\n",
    "    llm=llm, embed_model=embed_model\n",
    ")\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    [], storage_context=storage_context, service_context=service_context\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<b>The author faced a challenging moment when he couldn't figure out what to do with the early computer he had access to in 9th grade. This was due to the limited options for input and the lack of knowledge in math to do anything interesting with the available resources.</b>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"What was a hard moment for the author?\")\n",
    "display(Markdown(f\"<b>{response}</b>\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<b>Paul Graham</b>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "response = query_engine.query(\"Who is the author?\")\n",
    "display(Markdown(f\"<b>{response}</b>\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The author worked at Interleaf, where they learned several lessons, including the importance of product-focused leadership in technology companies, the drawbacks of code being edited by too many people, the limitations of conventional office hours for optimal hacking, and the risks associated with bureaucratic customers. Additionally, the author discovered the concept that the low end tends to dominate the high end, and that being the \"entry level\" option can be advantageous.\n",
      "\n",
      "Streamed output at 99.40073103089465 tokens/s\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "\n",
    "query_engine = index.as_query_engine(streaming=True)\n",
    "response = query_engine.query(\"What happened at interleaf?\")\n",
    "\n",
    "start_time = time.time()\n",
    "\n",
    "token_count = 0\n",
    "for token in response.response_gen:\n",
    "    print(token, end=\"\")\n",
    "    token_count += 1\n",
    "\n",
    "time_elapsed = time.time() - start_time\n",
    "tokens_per_second = token_count / time_elapsed\n",
    "\n",
    "print(f\"\\n\\nStreamed output at {tokens_per_second} tokens/s\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adding a document to existing index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<b>Blue</b>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "response = query_engine.query(\"What colour is the sky?\")\n",
    "display(Markdown(f\"<b>{response}</b>\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import Document\n",
    "\n",
    "index.insert_nodes([Document(text=\"The sky is indigo today\")])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "<b>The sky is indigo today.</b>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "response = query_engine.query(\"What colour is the sky?\")\n",
    "display(Markdown(f\"<b>{response}</b>\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Filtering"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.schema import TextNode\n",
    "\n",
    "nodes = [\n",
    "    TextNode(\n",
    "        text=\"The Shawshank Redemption\",\n",
    "        metadata={\n",
    "            \"author\": \"Stephen King\",\n",
    "            \"theme\": \"Friendship\",\n",
    "        },\n",
    "    ),\n",
    "    TextNode(\n",
    "        text=\"The Godfather\",\n",
    "        metadata={\n",
    "            \"director\": \"Francis Ford Coppola\",\n",
    "            \"theme\": \"Mafia\",\n",
    "        },\n",
    "    ),\n",
    "    TextNode(\n",
    "        text=\"Inception\",\n",
    "        metadata={\n",
    "            \"director\": \"Christopher Nolan\",\n",
    "        },\n",
    "    ),\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "index.insert_nodes(nodes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NodeWithScore(node=TextNode(id_='049f00de-13be-4af3-ab56-8c16352fe799', embedding=None, metadata={'director': 'Francis Ford Coppola', 'theme': 'Mafia'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='ad2a08d4364262546db9711b915348d43e0ccc41bd8c3c41775e133624e1fa1b', text='The Godfather', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=0.8120511)]"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters\n",
    "\n",
    "\n",
    "filters = MetadataFilters(\n",
    "    filters=[ExactMatchFilter(key=\"theme\", value=\"Mafia\")]\n",
    ")\n",
    "\n",
    "retriever = index.as_retriever(filters=filters)\n",
    "retriever.retrieve(\"What is inception about?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Query Mode\n",
    "Four query modes are supported: DEFAULT (vector search), SPARSE, HYBRID, and SEMANTIC_HYBRID."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Perform a Vector Search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Score: 0.8748552\n",
      "File Name: Unknown\n",
      "Id: bae0df75-ff37-4725-b659-b9fd8bf2ef3c\n",
      "\n",
      "Extracted Content:\n",
      "Inception\n",
      "\n",
      "======================================== End of Result ========================================\n",
      "\n",
      "Score: 0.8155207\n",
      "File Name: paul_graham_essay.txt\n",
      "Id: ae5aee85-a083-4141-bf75-bbb872f53760\n",
      "\n",
      "Extracted Content:\n",
      "It's not that unprestigious types of work are good per se. But when you find yourself drawn to some kind of work despite its current lack of prestige, it's a sign both that there's something real to be discovered there, and that you have the right kind of motives. Impure motives are a big danger for the ambitious. If anything is going to lead you astray, it will be the desire to impress people. So while working on things that aren't prestigious doesn't guarantee you're on the right track, it at least guarantees you're not on the most common type of wrong one.\n",
      "\n",
      "Over the next several years I wrote lots of essays about all kinds of different topics. O'Reilly reprinted a collection of them as a book, called Hackers & Painters after one of the essays in it. I also worked on spam filters, and did some more painting. I used to have dinners for a group of friends every thursday night, which taught me how to cook for groups. And I bought another building in Cambridge, a former candy factory (and later, twas said, porn studio), to use as an office.\n",
      "\n",
      "One night in October 2003 there was a big party at my house. It was a clever idea of my friend Maria Daniels, who was one of the thursday diners. Three separate hosts would all invite their friends to one party. So for every guest, two thirds of the other guests would be people they didn't know but would probably like. One of the guests was someone I didn't know but would turn out to like a lot: a woman called Jessica Livingston. A couple days later I asked her out.\n",
      "\n",
      "Jessica was in charge of marketing at a Boston investment bank. This bank thought it understood startups, but over the next year, as she met friends of mine from the startup world, she was surprised how different reality was. And how colorful their stories were. So she decided to compile a book of interviews with startup founders.\n",
      "\n",
      "When the bank had financial problems and she had to fire half her staff, she started looking for a new job. In early 2005 she interviewed for a marketing job at a Boston VC firm. It took them weeks to make up their minds, and during this time I started telling her about all the things that needed to be fixed about venture capital. They should make a larger number of smaller investments instead of a handful of giant ones, they should be funding younger, more technical founders instead of MBAs, they should let the founders remain as CEO, and so on.\n",
      "\n",
      "One of my tricks for writing essays had always been to give talks. The prospect of having to stand up in front of a group of people and tell them something that won't waste their time is a great spur to the imagination. When the Harvard Computer Society, the undergrad computer club, asked me to give a talk, I decided I would tell them how to start a startup. Maybe they'd be able to avoid the worst of the mistakes we'd made.\n",
      "\n",
      "So I gave this talk, in the course of which I told them that the best sources of seed funding were successful startup founders, because then they'd be sources of advice too. Whereupon it seemed they were all looking expectantly at me. Horrified at the prospect of having my inbox flooded by business plans (if I'd only known), I blurted out \"But not me!\" and went on with the talk. But afterward it occurred to me that I should really stop procrastinating about angel investing. I'd been meaning to since Yahoo bought us, and now it was 7 years later and I still hadn't done one angel investment.\n",
      "\n",
      "Meanwhile I had been scheming with Robert and Trevor about projects we could work on together. I missed working with them, and it seemed like there had to be something we could collaborate on.\n",
      "\n",
      "As Jessica and I were walking home from dinner on March 11, at the corner of Garden and Walker streets, these three threads converged. Screw the VCs who were taking so long to make up their minds. We'd start our own investment firm and actually implement the ideas we'd been talking about. I'd fund it, and Jessica could quit her job and work for it, and we'd get Robert and Trevor as partners too. [13]\n",
      "\n",
      "Once again, ignorance worked in our favor. We had no idea how to be angel investors, and in Boston in 2005 there were no Ron Conways to learn from. So we just made what seemed like the obvious choices, and some of the things we did turned out to be novel.\n",
      "\n",
      "There are multiple components to Y Combinator, and we didn't figure them all out at once. The part we got first was to be an angel firm.\n",
      "\n",
      "======================================== End of Result ========================================\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from llama_index.vector_stores.types import VectorStoreQueryMode\n",
    "\n",
    "default_retriever = index.as_retriever(\n",
    "    vector_store_query_mode=VectorStoreQueryMode.DEFAULT\n",
    ")\n",
    "response = default_retriever.retrieve(\"What is inception about?\")\n",
    "\n",
    "# Loop through each NodeWithScore in the response\n",
    "for node_with_score in response:\n",
    "    node = node_with_score.node  # The TextNode object\n",
    "    score = node_with_score.score  # The similarity score\n",
    "    chunk_id = node.id_  # The chunk ID\n",
    "\n",
    "    # Extract the relevant metadata from the node\n",
    "    file_name = node.metadata.get(\"file_name\", \"Unknown\")\n",
    "    file_path = node.metadata.get(\"file_path\", \"Unknown\")\n",
    "\n",
    "    # Extract the text content from the node\n",
    "    text_content = node.text if node.text else \"No content available\"\n",
    "\n",
    "    # Print the results in a user-friendly format\n",
    "    print(f\"Score: {score}\")\n",
    "    print(f\"File Name: {file_name}\")\n",
    "    print(f\"Id: {chunk_id}\")\n",
    "    print(\"\\nExtracted Content:\")\n",
    "    print(text_content)\n",
    "    print(\"\\n\" + \"=\" * 40 + \" End of Result \" + \"=\" * 40 + \"\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Perform a Hybrid Search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NodeWithScore(node=TextNode(id_='bae0df75-ff37-4725-b659-b9fd8bf2ef3c', embedding=None, metadata={'director': 'Christopher Nolan'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='9792a1fd7d2e1a08f1b1d70a597357bb6b68d69ed5685117eaa37ac9e9a3565e', text='Inception', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=0.03181818127632141),\n",
       " NodeWithScore(node=TextNode(id_='ae5aee85-a083-4141-bf75-bbb872f53760', embedding=None, metadata={'file_path': '..\\\\data\\\\paul_graham\\\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75395, 'creation_date': '2023-12-12', 'last_modified_date': '2023-12-12', 'last_accessed_date': '2024-02-02'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='627552ee-116a-4132-a7d3-7e7232f75866', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'file_path': '..\\\\data\\\\paul_graham\\\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75395, 'creation_date': '2023-12-12', 'last_modified_date': '2023-12-12', 'last_accessed_date': '2024-02-02'}, hash='0a59e1ce8e50a67680a5669164f79e524087270ce183a3971fcd18ac4cad1fa0'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='24a1d375-31e3-492c-ac02-5091e3572e3f', node_type=<ObjectType.TEXT: '1'>, metadata={'file_path': '..\\\\data\\\\paul_graham\\\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75395, 'creation_date': '2023-12-12', 'last_modified_date': '2023-12-12', 'last_accessed_date': '2024-02-02'}, hash='51c474a12ac8e9748258b2c7bbe77bb7c8bf35b775ed44f016057a0aa8b0bd76'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='196569e0-2b10-4ba3-8263-a69fb78dd98c', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='192082e7ba84b8c5e2a64bd1d422c6c503189fc3ba325bb3e6e8bdb43db03fbb')}, hash='a3ea638857f1daadf7af967322480f97e1235dac3ee7d72b8024670785df8810', text='It\\'s not that unprestigious types of work are good per se. But when you find yourself drawn to some kind of work despite its current lack of prestige, it\\'s a sign both that there\\'s something real to be discovered there, and that you have the right kind of motives. Impure motives are a big danger for the ambitious. If anything is going to lead you astray, it will be the desire to impress people. So while working on things that aren\\'t prestigious doesn\\'t guarantee you\\'re on the right track, it at least guarantees you\\'re not on the most common type of wrong one.\\n\\nOver the next several years I wrote lots of essays about all kinds of different topics. O\\'Reilly reprinted a collection of them as a book, called Hackers & Painters after one of the essays in it. I also worked on spam filters, and did some more painting. I used to have dinners for a group of friends every thursday night, which taught me how to cook for groups. And I bought another building in Cambridge, a former candy factory (and later, twas said, porn studio), to use as an office.\\n\\nOne night in October 2003 there was a big party at my house. It was a clever idea of my friend Maria Daniels, who was one of the thursday diners. Three separate hosts would all invite their friends to one party. So for every guest, two thirds of the other guests would be people they didn\\'t know but would probably like. One of the guests was someone I didn\\'t know but would turn out to like a lot: a woman called Jessica Livingston. A couple days later I asked her out.\\n\\nJessica was in charge of marketing at a Boston investment bank. This bank thought it understood startups, but over the next year, as she met friends of mine from the startup world, she was surprised how different reality was. And how colorful their stories were. So she decided to compile a book of interviews with startup founders.\\n\\nWhen the bank had financial problems and she had to fire half her staff, she started looking for a new job. In early 2005 she interviewed for a marketing job at a Boston VC firm. It took them weeks to make up their minds, and during this time I started telling her about all the things that needed to be fixed about venture capital. They should make a larger number of smaller investments instead of a handful of giant ones, they should be funding younger, more technical founders instead of MBAs, they should let the founders remain as CEO, and so on.\\n\\nOne of my tricks for writing essays had always been to give talks. The prospect of having to stand up in front of a group of people and tell them something that won\\'t waste their time is a great spur to the imagination. When the Harvard Computer Society, the undergrad computer club, asked me to give a talk, I decided I would tell them how to start a startup. Maybe they\\'d be able to avoid the worst of the mistakes we\\'d made.\\n\\nSo I gave this talk, in the course of which I told them that the best sources of seed funding were successful startup founders, because then they\\'d be sources of advice too. Whereupon it seemed they were all looking expectantly at me. Horrified at the prospect of having my inbox flooded by business plans (if I\\'d only known), I blurted out \"But not me!\" and went on with the talk. But afterward it occurred to me that I should really stop procrastinating about angel investing. I\\'d been meaning to since Yahoo bought us, and now it was 7 years later and I still hadn\\'t done one angel investment.\\n\\nMeanwhile I had been scheming with Robert and Trevor about projects we could work on together. I missed working with them, and it seemed like there had to be something we could collaborate on.\\n\\nAs Jessica and I were walking home from dinner on March 11, at the corner of Garden and Walker streets, these three threads converged. Screw the VCs who were taking so long to make up their minds. We\\'d start our own investment firm and actually implement the ideas we\\'d been talking about. I\\'d fund it, and Jessica could quit her job and work for it, and we\\'d get Robert and Trevor as partners too. [13]\\n\\nOnce again, ignorance worked in our favor. We had no idea how to be angel investors, and in Boston in 2005 there were no Ron Conways to learn from. So we just made what seemed like the obvious choices, and some of the things we did turned out to be novel.\\n\\nThere are multiple components to Y Combinator, and we didn\\'t figure them all out at once. The part we got first was to be an angel firm.', start_char_idx=45670, end_char_idx=50105, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=0.03009207174181938)]"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from llama_index.vector_stores.types import VectorStoreQueryMode\n",
    "\n",
    "hybrid_retriever = index.as_retriever(\n",
    "    vector_store_query_mode=VectorStoreQueryMode.HYBRID\n",
    ")\n",
    "hybrid_retriever.retrieve(\"What is inception about?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Perform a Hybrid Search with Semantic Reranking\n",
    "This mode incorporates semantic reranking to hybrid search results to improve search relevance. \n",
    "\n",
    "Please see this link for further details: https://learn.microsoft.com/azure/search/semantic-search-overview"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NodeWithScore(node=TextNode(id_='bae0df75-ff37-4725-b659-b9fd8bf2ef3c', embedding=None, metadata={'director': 'Christopher Nolan'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='9792a1fd7d2e1a08f1b1d70a597357bb6b68d69ed5685117eaa37ac9e9a3565e', text='Inception', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=2.3949906826019287),\n",
       " NodeWithScore(node=TextNode(id_='fc9782a2-c255-4265-a618-3a864abe598d', embedding=None, metadata={'file_path': '..\\\\data\\\\paul_graham\\\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75395, 'creation_date': '2023-12-12', 'last_modified_date': '2023-12-12', 'last_accessed_date': '2024-02-02'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='627552ee-116a-4132-a7d3-7e7232f75866', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'file_path': '..\\\\data\\\\paul_graham\\\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75395, 'creation_date': '2023-12-12', 'last_modified_date': '2023-12-12', 'last_accessed_date': '2024-02-02'}, hash='0a59e1ce8e50a67680a5669164f79e524087270ce183a3971fcd18ac4cad1fa0'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='94d87013-ea3d-4a9c-982a-dde5ff219983', node_type=<ObjectType.TEXT: '1'>, metadata={'file_path': '..\\\\data\\\\paul_graham\\\\paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75395, 'creation_date': '2023-12-12', 'last_modified_date': '2023-12-12', 'last_accessed_date': '2024-02-02'}, hash='f28897170c6b61162069af9ee83dc11e13fa0f6bf6efaa7b3911e6ad9093da84'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='dc3852e5-4c1e-484e-9e65-f17084d3f7b4', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='deaee6d5c992dbf757876957aa9112a42d30a636c6c83d81fcfac4aaf2d24dee')}, hash='a3b31e5ec2b5d4a9b3648de310c8a5962c17afdb800ea0e16faa47956607866d', text='And at the same time all involved would adhere outwardly to the conventions of a 19th century atelier. We actually had one of those little stoves, fed with kindling, that you see in 19th century studio paintings, and a nude model sitting as close to it as possible without getting burned. Except hardly anyone else painted her besides me. The rest of the students spent their time chatting or occasionally trying to imitate things they\\'d seen in American art magazines.\\n\\nOur model turned out to live just down the street from me. She made a living from a combination of modelling and making fakes for a local antique dealer. She\\'d copy an obscure old painting out of a book, and then he\\'d take the copy and maltreat it to make it look old. [3]\\n\\nWhile I was a student at the Accademia I started painting still lives in my bedroom at night. These paintings were tiny, because the room was, and because I painted them on leftover scraps of canvas, which was all I could afford at the time. Painting still lives is different from painting people, because the subject, as its name suggests, can\\'t move. People can\\'t sit for more than about 15 minutes at a time, and when they do they don\\'t sit very still. So the traditional m.o. for painting people is to know how to paint a generic person, which you then modify to match the specific person you\\'re painting. Whereas a still life you can, if you want, copy pixel by pixel from what you\\'re seeing. You don\\'t want to stop there, of course, or you get merely photographic accuracy, and what makes a still life interesting is that it\\'s been through a head. You want to emphasize the visual cues that tell you, for example, that the reason the color changes suddenly at a certain point is that it\\'s the edge of an object. By subtly emphasizing such things you can make paintings that are more realistic than photographs not just in some metaphorical sense, but in the strict information-theoretic sense. [4]\\n\\nI liked painting still lives because I was curious about what I was seeing. In everyday life, we aren\\'t consciously aware of much we\\'re seeing. Most visual perception is handled by low-level processes that merely tell your brain \"that\\'s a water droplet\" without telling you details like where the lightest and darkest points are, or \"that\\'s a bush\" without telling you the shape and position of every leaf. This is a feature of brains, not a bug. In everyday life it would be distracting to notice every leaf on every bush. But when you have to paint something, you have to look more closely, and when you do there\\'s a lot to see. You can still be noticing new things after days of trying to paint something people usually take for granted, just as you can after days of trying to write an essay about something people usually take for granted.\\n\\nThis is not the only way to paint. I\\'m not 100% sure it\\'s even a good way to paint. But it seemed a good enough bet to be worth trying.\\n\\nOur teacher, professor Ulivi, was a nice guy. He could see I worked hard, and gave me a good grade, which he wrote down in a sort of passport each student had. But the Accademia wasn\\'t teaching me anything except Italian, and my money was running out, so at the end of the first year I went back to the US.\\n\\nI wanted to go back to RISD, but I was now broke and RISD was very expensive, so I decided to get a job for a year and then return to RISD the next fall. I got one at a company called Interleaf, which made software for creating documents. You mean like Microsoft Word? Exactly. That was how I learned that low end software tends to eat high end software. But Interleaf still had a few years to live yet. [5]\\n\\nInterleaf had done something pretty bold. Inspired by Emacs, they\\'d added a scripting language, and even made the scripting language a dialect of Lisp. Now they wanted a Lisp hacker to write things in it. This was the closest thing I\\'ve had to a normal job, and I hereby apologize to my boss and coworkers, because I was a bad employee. Their Lisp was the thinnest icing on a giant C cake, and since I didn\\'t know C and didn\\'t want to learn it, I never understood most of the software. Plus I was terribly irresponsible. This was back when a programming job meant showing up every day during certain working hours.', start_char_idx=14179, end_char_idx=18443, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=1.0986518859863281)]"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hybrid_retriever = index.as_retriever(\n",
    "    vector_store_query_mode=VectorStoreQueryMode.SEMANTIC_HYBRID\n",
    ")\n",
    "hybrid_retriever.retrieve(\"What is inception about?\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llamaindextest01",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
